Naming Languages - bryandragon.com
As part of the Novetta Mission Analytics team, I work on a data pipeline that ingests traditional and social media from around the world, enriches it, and makes the enriched data available to customers. Enrichment can involve any number of steps, many of them powered by machine learning, and one of the earliest and most common is translation. When new content arrives, the source language is often unknown and must be detected; if the source language differs from the target language, the content is also translated. To translate content at this volume automatically, accurately, and cost-effectively, we rely on multiple cloud translation services. To the surprise of no one, these services differ not only in pricing but also in the languages they support and in the quality of translation for each language. It's often most cost-effective to perform language detection with one service and, depending on the detected language, translation with another. In addition, the services occasionally use different identifiers for the same language, which requires us to do some mapping on our end.
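To make the mapping problem concrete, here is a minimal sketch of how detection with one service and translation with another might be stitched together. Everything in it is an assumption for illustration: the service names (`detector_a`, `translator_b`, `translator_c`), the alias tables, and the routing rule are hypothetical and are not taken from our pipeline or from any real provider's API. The idea is simply to normalize whatever code the detector returns to one canonical tag, choose a translator based on that tag, and then convert the tag back into the form the chosen translator expects.

```python
from typing import Optional, Tuple

# Hypothetical alias tables. A real pipeline would build these from each
# provider's published list of supported languages.
SERVICE_TO_CANONICAL = {
    "detector_a": {"iw": "he", "zh-CN": "zh-Hans"},
}
CANONICAL_TO_SERVICE = {
    "translator_b": {"zh-Hans": "zh"},
    "translator_c": {},
}

# Hypothetical routing rule: which translator to use for which source language.
TRANSLATOR_FOR_LANGUAGE = {"zh-Hans": "translator_b"}
DEFAULT_TRANSLATOR = "translator_c"


def to_canonical(service: str, code: str) -> str:
    """Map a service-specific code to the canonical tag (identity if unmapped)."""
    return SERVICE_TO_CANONICAL.get(service, {}).get(code, code)


def from_canonical(service: str, tag: str) -> str:
    """Map a canonical tag to the code a given service expects (identity if unmapped)."""
    return CANONICAL_TO_SERVICE.get(service, {}).get(tag, tag)


def plan_translation(
    detected_code: str, target: str = "en", detector: str = "detector_a"
) -> Optional[Tuple[str, str]]:
    """Return (translator, source code for that translator), or None if no
    translation is needed because the source already matches the target."""
    source = to_canonical(detector, detected_code)
    if source == target:
        return None
    translator = TRANSLATOR_FOR_LANGUAGE.get(source, DEFAULT_TRANSLATOR)
    return translator, from_canonical(translator, source)
```

With these made-up tables, content detected as `zh-CN` would be routed to `translator_b` using the code `zh`, while content detected as `iw` would fall through to `translator_c` using `he`. Keeping a single canonical tag in the middle means each new service needs only its own pair of tables rather than a mapping to every other service.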